feat(kafka): Replay kafka from last commit before becoming ready #14330

Merged: @benclive merged 8 commits into main from benclive/implement-kafka-startup-catchup-logic on Oct 4, 2024

Conversation

benclive
Contributor

@benclive benclive commented Oct 1, 2024

What this PR does / why we need it:

This stops a Kafka-ingester from becoming "ready" to serve requests before it has replayed all the latest data from Kafka.

  • This implementation is heavily inspired by, and partly copied from, Mimir's implementation.
  • Readiness is controlled by two parameters: target lag and max lag. Most of the time the ingester becomes ready once it has caught up to within target lag. If replaying the data takes too long, we still make sure it is within max lag before it becomes ready.
  • Since we are using a WAL, we expect that all data prior to the last commit on the consumer group at the ingester's partition is safely stored and replayed by the WAL mechanism. Therefore, we only resume consuming from the last commit.
    • Each ingester instance has its own consumer group to ensure this behaviour.

Which issue(s) this PR fixes:
Fixes https://github.com/grafana/loki-private/issues/1119

Special notes for your reviewer:
I haven't explicitly tested the replay and multiple-attempts functions yet. I couldn't test them via the existing test framework without injecting time delays into the test Kafka. I'll investigate adding some more tests for the functions directly.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@benclive benclive requested a review from a team as a code owner October 1, 2024 11:30
@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Oct 1, 2024
Contributor

@cyriltovena cyriltovena left a comment

LGTM

I like this approach. Let's test it.

Resolved review threads: pkg/kafka/config.go (5), pkg/kafka/partition/reader.go (9)
@benclive benclive merged commit 39b57ec into main Oct 4, 2024
61 checks passed
@benclive benclive deleted the benclive/implement-kafka-startup-catchup-logic branch October 4, 2024 08:59
Labels
size/L type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
4 participants